Program parallelizing apparatus.



A program parallelizing apparatus for generating, from a source program (1) to be executed, an object
program which is capable of being processed in parallel by a plurality of processors (77) constituting a
multi-processor (9) which includes a communication mechanism for allowing inter-processor communication and a
synchronization mechanism for allowing the processings to proceed in parallel among the processors through
coordination. Object programs susceptible to parallel processing by the multi-processor system with high
efficiency can be generated at a high speed in correspondence to various source programs universally
independent of the types of processors. (Fig. 1)
PROGRAM PARALLELIZING APPARATUS



BACKGROUND OF THE INVENTION 

The present invention relates to a program parallelizing apparatus for generating, from a source program
to be executed, an object program which is susceptible to parallel processing by a multi-processor system
including a plurality of processors, a communication mechanism for performing inter-processor
communications and a synchronizing mechanism for allowing the processing to proceed among the processors through
coordination (wait) processing.

Heretofore, in the case where parallel processing is to be performed in a multi-processor system, such
a task scheduling is resorted to in which a program is divided into units each of a large size such as jobs,
threads or the like, wherein it is prerequisite for the task scheduling to confirm beforehand by flag check or by
token control method that all the inter-task relations are satisfied.

A technique relevant to this type of processing system is disclosed, for example, in JP-A-63-184841.

The prior art program parallelizing technique is disadvantageous in that since the program is divided
into units or segments of a large size for scheduling for parallel processing, it is difficult or even impossible
to derive the parallelism which the program has inherently, making it difficult or impossible to realize the
parallel processing effectively.

Besides, because the inter-task sequential relation is controlled or managed by the flag check or by the 
token control, lots of time is required for generating the parallelizing schedule, giving rise to another 
problem. 




SUMMARY OF THE INVENTION 

It is therefore an object of the present invention to provide a program parallelizing apparatus for
generating object programs at a high speed in correspondence to various source programs universally
independent of types of processors incorporated in a multi-processor system which includes a
communication mechanism for performing communication among the constituent processors and a synchronizing
mechanism for allowing the processing to proceed among the processors through coordination processing
such as wait processing, so that the object programs can be processed in parallel with high efficiency in the
multi-processor system.

Another object of the present invention is to provide a program parallelizing apparatus which is capable 
of obtaining the result of task scheduling which can reduce overhead involved in the synchronization 
processing of the multi-processor system. 

It is still another object of the present invention to provide a program parallelizing apparatus which is
capable of assigning object programs to be executed to the individual processors, respectively, of the
multi-processor system while reducing overhead involved in the synchronization processing by the
multi-processor system.

A further object of the present invention is to provide a program parallelizing apparatus which is capable
of assigning the object codes to be executed to the individual processors of the multi-processor system
such that the time taken for execution in the multi-processor system can substantially be minimized while
taking into account overhead attributable to the parallel processing in the multi-processor system.

It is a still further object of the present invention to provide a program parallelizing apparatus having a
high universality so as to be compatible with a great variety of processors.

It is yet another object of the present invention to provide a program parallelizing apparatus which is
capable of finding out easily the inter-task relations without changing the structure of the original program to
thereby realize the scheduling of tasks at a high speed.

According to a first aspect of the present invention, there is provided a program parallelizing apparatus
for generating, from a source program to be executed, an object program susceptible to parallel processing
by a multi-processor system which includes a plurality of processors, communication means for allowing
communication among the processors and synchronizing means for allowing the processing to proceed in
parallel among the processors through coordination or wait processing. The program parallelizing apparatus
comprises a task division means for dividing the source program to be executed into tasks, for generating
information on the inter-task relation by checking operand data contained in the source program
and determining those tasks whose execution is influenced by data resulting from execution of given
tasks, or the given tasks whose execution influences the execution of the former, and for generating
information on the task processing time by adding the times required for processing the machine instructions
which constitute the respective tasks. The program parallelizing apparatus further comprises a task scheduling
means which responds to the task processing time information and the inter-task relation information outputted
from the task division means by generating groups of tasks to be executed by the respective processors
of the multi-processor system, as well as processing sequence information therefor, such that the tasks
capable of being executed in parallel in the multi-processor system are executed separately by different
processors without involving contradiction in the inter-task sequential relation, and by generating
synchronization information indicating time points at which the synchronizing means should perform the
coordination or wait processing. There is further provided a parallel compiling means which has a function
for defining the tasks generated by the task scheduling means to be assigned to each of the
processors of the multi-processor system, together with the processing sequence information therefor, as object
programs to be executed by the individual processors, respectively, in the multi-processor system and
assigning the object programs to the processors.

With the structure or arrangement of the program parallelizing apparatus according to the invention
described above, it is possible to generate at a high speed, from a source program to be executed, an object
program capable of being processed in parallel by a plurality of processors constituting a multi-processor
which includes a communication mechanism for allowing inter-processor communication and a
synchronization mechanism for allowing the processings to proceed in parallel among the processors through
coordination. Thus, the object program susceptible to parallel processing in the multi-processor system with
high efficiency can be generated at a high speed in correspondence to various source programs universally
independent of the types of processors.

In accordance with a second aspect of the present invention, it is proposed that the task scheduling
means is imparted with a function for generating task link information for linking into one task those tasks
which have undergone the task scheduling and which can decrease, when linked together, the number of
times the coordination or wait processing is to be performed by the synchronization means and hence the
time required for executing the parallel processing in the multi-processor system.

By virtue of the feature of the invention described above, such task scheduling is realized which can 
reduce overhead involved in the synchronization processing by the multi-processor system. 

In accordance with a third aspect of the invention, it is proposed in conjunction with the second aspect
mentioned above that the task scheduling means is imparted with a function for linking the tasks in
accordance with the task link information.

Thus, in the program parallelizing apparatus according to the third aspect of the present invention, it is 
possible to assign the object programs to be executed to the individual processors of the multi-processor 
system while reducing overhead involved in the synchronization processing by the multi-processor system. 

According to a fourth aspect of the invention, it is proposed that the task division means is imparted
with a function for linking the tasks to thereby generate a new task in accordance with the task link
information generated by the task scheduling means.

By virtue of this arrangement, it is possible to assign the object codes to be executed to the individual
processors of the multi-processor system such that the time taken for execution in the multi-processor
system can substantially be minimized while taking into account overhead attributable to the parallel
processing in the multi-processor system.

According to a fifth aspect of the present invention, it is proposed that a precompiler is provided for
translating the source program into a pre-object program including virtual machine codes expressing
individual instructions for various processors in general terms.

By virtue of this feature, the program parallelizing apparatus according to the invention can enjoy a high
universality and is compatible with a great variety of processors.

According to a sixth aspect of the invention, it is proposed that the virtual machine codes which can set 
the data expressed in the form of function in one-to-one correspondence are employed. 

Thus, it is possible according to the feature mentioned just above to find out easily the inter-task
relation without need for modification of the original program, whereby task scheduling can be
accomplished at a high speed, to further advantage.

BRIEF DESCRIPTION OF THE DRAWINGS 


Fig. 1 is a block diagram showing schematically a general arrangement of a program parallelizing
apparatus according to a first exemplary embodiment of the invention;

Fig. 2 is a view for illustrating an example of expression in a function form;

Fig. 3 is a view showing an example of a task registration table employed in the program parallelizing
apparatus;

Fig. 4 is a view showing an example of a virtual machine code / real machine code collation table;

Fig. 5 is a chart for illustrating an inter-task sequential relation;

Fig. 6 is a view for illustrating an example of a task relation table;

Fig. 7 is a diagram showing an example of a task relation graph;

Fig. 8 is a diagram showing an example of the initial positions of tasks in a parallelizing schedule;

Fig. 9 is a view showing an example of a level file;

Fig. 10 is a diagram showing an example of the parallelizing schedule in which simultaneous
synchronization is employed;

Fig. 11 is a diagram showing an example of the parallelizing schedule in which a team
synchronization is employed;

Figs. 12 and 13 are views for illustrating, respectively, examples of task move capable of decreasing
the numbers of processors to be used;

Figs. 14 and 15 are diagrams showing, respectively, further examples of the task relation graph;

Figs. 16 and 17 are diagrams showing, respectively, examples of the parallelizing schedule in which
simultaneous synchronization is employed;

Fig. 18 is a diagram for illustrating, by way of example, the conditions for task linking;

Fig. 19 is a view showing an example of a processor file;

Fig. 20 is a flow chart for illustrating an example of algorithm for the task linkage;

Fig. 21 is a view showing an example of a task link file;

Fig. 22 is a view showing an example of a level file;

Figs. 23, 24 and 25 are views showing, respectively, further examples of the processor file;

Fig. 26 is a schematic block diagram showing a general arrangement of the program parallelizing
apparatus according to a second embodiment of the invention;

Fig. 27 is a diagram showing a third embodiment of the present invention;

Fig. 28 is a block diagram showing a general arrangement of the program parallelizing apparatus
according to a fourth embodiment of the invention; and

Fig. 29 is a schematic diagram showing a structure of a processor (77-82) incorporated in a target
machine.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

In the following, the present invention will be described in detail in conjunction with preferred or
exemplary embodiments thereof by reference to the accompanying drawings.

First Embodiment

Fig. 1 is a block diagram showing schematically a general arrangement of a program parallelizing
apparatus according to a first exemplary embodiment of the invention. Referring to the figure, the program
parallelizing apparatus denoted generally by a numeral 10 is composed of a precompiling means 2, nucleus
system routines 3, a task division or segmentation means 4, a scheduling means 5 and a parallel compiling
means 6.

A parallel processing system (multi-processor system) 9, also referred to as the target machine, is adapted
to execute object codes and sequences which have been made capable of being processed in parallel by
the program parallelizing apparatus 10 according to the invention. It includes a plurality of processors, an
inter-processor synchronizing mechanism or means for allowing the processings to proceed in parallel
among the individual processors without involving confusion or contradiction in the inter-task sequential
relation by checking the ends of the tasks as executed, and a communication mechanism or means for
ensuring positively data transactions or transfers among the individual processors.

As an example of the inter-processor synchronizing mechanism, there may be employed an
inter-processor synchronizing apparatus disclosed in JP-A-63-45670 (Japanese Patent Application Laid-Open No.
45670/1988). On the other hand, the communication mechanism may be implemented, for example, by
using a shared memory which permits access thereto from given processors at random at a high speed and
with a high efficiency.

The shared memory arrangement mentioned above may be realized by controlling a bus system
connecting the individual processors to the shared memory with the aid of such a high-speed/high-performance
signal control circuit as disclosed in JP-A-62-154057 or JP-A-63-47866, for example.

A typical structure of the target machine 9 is shown in Fig. 28. In the case of the instant (first)
embodiment of the invention, arrangement is made such that the data used in common by a plurality of
processors of the target machine 9 is stored in the shared memories, while the data used only by a
particular one of the processors is placed in a local memory or register of that particular processor.

In order to make it possible to process a source program 1 in parallel by the multi-processor system or 
target machine 9, the program parallelizing apparatus 10 according to the instant embodiment operates in 
such a manner as described below. 

The precompiling means 2 translates a source program 1 to be executed into virtual machine codes
prescribed by the nucleus system routines 3, which codes are then supplied to the task division means 4.
With the phrase "virtual machine code", it is intended to mean a machine code which expresses in general
terms a processor instruction and which can satisfy the conditions necessary for the parallelization. More
specifically, for satisfying the conditions necessary for the parallelization, there is employed in the case of
the instant embodiment such virtual machine code which has the capability of translating the data
expressed in the form of functions in one-to-one correspondence. As a method of translating a program to
the virtual machine codes, there is such a method as mentioned below by way of example. A table
indicating correspondences between statements describing a source program and the nucleus system
routines is prepared. This table will hereinafter be referred to as the SN table. The precompiling means 2
translates the source program 1 to the virtual machine codes with the aid of the SN table and the nucleus
system routines.

The SN table may be incorporated, for example, in the precompiling means 2. The nucleus system 
routines are constituted by the virtual machine codes for performing given processings, respectively. A set 
of such nucleus system routines will be referred to as a nucleus language. 

In this way, the precompiling means 2 translates the source program 1 into the virtual machine codes
which are then sent to the task division means 4.

In the task division means 4, the virtual machine codes supplied from the precompiling means
2 are segmented or separated into units termed tasks, and data or information is generated concerning the
processing time required for the processing of each task and the sequential (preceding/succeeding) relations
among the tasks, i.e. the inter-task sequential relations, which may be defined as follows. Suppose that
the execution of a certain task Q requires indispensably the result of execution of another task P. It is then
apparent that the execution of the task P has to be completed before the execution of the task Q can be
started. In that case, it is said that the inter-task sequential relation exists between the task P and the task
Q, wherein the task P is termed the ascendent task for the task Q, while the task Q is referred to as the
descendent task relative to the task P.

The inter-task sequential relations can be indicated by arrows, as illustrated in Fig. 5. The diagram
showing the inter-task sequential relation will hereinafter be referred to as the inter-task relation graph.

The task division means 4 can prepare the table indicating the inter-task sequential relations (task 
relation table) such as shown in Fig. 6 on a case-by-case basis as follows. 


(1) In case a source program 1 is described in the form of function: 

Since the virtual machine code outputted from the precompiling means 2 is in the form of function, the
task A which outputs the data required necessarily for executing each of tasks T constitutes the ascendent
task for the latter. Accordingly, on the basis of this relation, the inter-task relation table can be generated.
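
The derivation in case (1) can be sketched in Python as follows. This is only an illustration under stated assumptions: each task is taken to be a single statement of the form "name = expression", every name is assigned by exactly one task, and the function name `build_relation_table` and the input format are hypothetical, not part of the apparatus as described.

```python
import re

def build_relation_table(tasks):
    """Map each task number to the sorted list of its ascendent task numbers.

    `tasks` maps a task number to a single-assignment statement such as
    "c = a + b" (assumed input format). Each variable is assumed to be
    assigned by one task only, i.e. the program is in function form.
    """
    producer = {}  # variable name -> number of the task that outputs it
    for number, statement in tasks.items():
        target = statement.split("=", 1)[0].strip()
        producer[target] = number

    table = {}
    for number, statement in tasks.items():
        rhs = statement.split("=", 1)[1]
        operands = re.findall(r"[A-Za-z_]\w*", rhs)
        # A task that outputs one of the operands is an ascendent task.
        table[number] = sorted({producer[v] for v in operands if v in producer})
    return table

tasks = {1: "a = 2", 2: "b = 3", 3: "c = a + b", 4: "d = c * a"}
relation_table = build_relation_table(tasks)
```

Here task 3 reads the outputs of tasks 1 and 2, so both are registered as its ascendent tasks, in the manner of the task relation table of Fig. 6.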

(2) In case the source program 1 is described in a non-function form: 

a) When program rewriting is not effected at the source program level, a table indicating correspondences
between the statements of the source program 1 and corresponding function forms therefor may be
provided for the precompiling means 2, wherein the table content and the precompiled virtual machine
codes are sent to the task division means 4. Thus, the inter-task relation table indicating the ascendent task
A and the descendent tasks T requiring for the execution thereof the data resulting from the execution of
the task A can be generated through cooperation of the precompiling means 2 and the task division means
4.

b) In contrast, when program rewriting is effected at the source program level, the source program is
translated into a function form at the level of task module which can be regarded as a minimum unit as
described below, so that the instant case can be reduced to the abovementioned case (1).

i) In case the source program uses only the memory variables (without using register variables), this
corresponds to a source program which is constituted only by the memory operations without resorting to
the register operation. When the shared memory is selected as the abovementioned memory, data in the
shared memory can be accessed by any given one of the processors. As the method for translating the
source program into the function form, there may be mentioned a method according to which different
variables and constants are not identified with a same name. An example of this method is illustrated in Fig.
2. Referring to the figure, "a = b + c" means that "(b + c) is executed, whereon the result of the
execution is substituted for a". As an algorithm for the translation to the function form in this manner, there
may be mentioned such a translation algorithm as described below.

When a character (or a character string) representing a name of a same variable or constant makes
appearance m times (m ≥ 2) in the description at the left-hand side to the equality sign, the left-hand side of
the i-th expression (where 2 ≤ i ≤ m) is translated into a character (or character string) N_i which differs from
the character (or character string) N_1 representing the variable or constant name of the first occurrence,
whereon in the expressions from the next one (i + 1) up to and inclusive of the expression in which the
same character or character string N_1 again makes appearance in the left-hand side to the equality sign, the same
character string in the right-hand sides is rewritten by the character string N_i. By repeating the
above procedure for each i given by 2 ≤ i ≤ m, the source program is transformed to the function form,
which can thus be processed similarly to the abovementioned case (1).
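
The renaming procedure above can be sketched in Python as follows. It is a minimal illustration, assuming statements of the form "target = expression" and assuming that a generated name such as `a_2` does not collide with a name already used in the program; the function name `to_function_form` and the naming scheme are hypothetical.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def to_function_form(statements):
    """Rename repeated assignment targets so that each name is assigned once.

    `statements` is a list of strings of the form "target = expression"
    (assumed input format).
    """
    count = {}    # original name -> number of assignments seen so far
    current = {}  # original name -> name that currently holds its value
    result = []
    for statement in statements:
        target, expression = (part.strip() for part in statement.split("=", 1))
        # Right-hand sides read the most recent version of every name.
        expression = IDENT.sub(lambda m: current.get(m.group(), m.group()), expression)
        count[target] = count.get(target, 0) + 1
        i = count[target]
        # The first occurrence keeps its name; the i-th gets a fresh one.
        new_name = target if i == 1 else f"{target}_{i}"
        current[target] = new_name
        result.append(f"{new_name} = {expression}")
    return result

renamed = to_function_form(["a = b + c", "d = a * 2", "a = a + d", "e = a - b"])
```

In this example the second assignment to `a` becomes `a_2 = a + d`, and the later read of `a` in the last statement is rewritten to `a_2`, so that each name is defined by exactly one statement.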

ii) In case the source program uses register variables, either one of the undermentioned methods (A) or
(B) is adopted.

A) All register operations are translated into memory operations, which can thus be reduced to the
aforementioned case (2)-b)-i).

B) A module is established as the minimum unit of task, which can be loaded from a memory and
stored in a memory. To this end, the source program is divided or segmented so as to be compatible with
the modules. When the simple division of the source program is insufficient, memory load instructions as
well as memory store instructions may be added and inserted. Thus, this case can be reduced to the
aforementioned case (2)-b)-i).

As the task segmenting or dividing method which can be adopted in conjunction with the task division
means 4, there may be mentioned, by way of example, a method which is described below.

The source program 1 is translated to the virtual machine codes by the precompiling means 2. The task
division means 4 translates the virtual machine codes, developed down to single virtual machine code
units, into tasks such that one virtual machine code constitutes or corresponds to one task, wherein the
contents of the tasks (i.e. the virtual machine codes contained in the tasks) are registered in a task
registration table, which contains the task identification numbers (Nos.), the virtual machine codes contained
in the individual tasks, and the task processing times, as is shown in Fig. 3. Subsequently, by consulting a
virtual machine code/real machine code collation table shown in Fig. 4 and by referring to the column of the
real processor of the target machine which is to be used, the processing time required for executing the
task of concern by that processor is registered as the task processing time in the task registration table at
the column labeled "task processing time". The virtual machine code/real machine code collation table
indicates the machine codes of the real processors corresponding to the virtual machine codes and the
respective processing times on a virtual machine code basis for each of the real processors. In the task
registration table shown in Fig. 3, it is assumed, by way of example, that the real processor of concern is a
processor A.
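
As a rough illustration of how the task registration table may be filled in from the collation table, consider the following Python sketch. The virtual machine code names, real machine code names and processing times are invented placeholders, and `COLLATION`, `TaskEntry` and `register_tasks` are hypothetical names; the actual contents of Figs. 3 and 4 are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical collation table:
# virtual machine code -> {real processor: (real machine code, processing time)}
COLLATION = {
    "VLOAD": {"A": ("ld", 3), "B": ("MOVE", 3)},
    "VADD": {"A": ("add", 2), "B": ("ADD.W", 3)},
    "VMUL": {"A": ("mul", 5), "B": ("MUL.W", 4)},
}

@dataclass
class TaskEntry:
    number: int            # task identification number
    virtual_codes: list    # virtual machine codes contained in the task
    processing_time: int   # time on the selected real processor

def register_tasks(virtual_codes, processor):
    """Build a task registration table with one virtual machine code per task."""
    table = []
    for number, code in enumerate(virtual_codes, start=1):
        _real_code, time = COLLATION[code][processor]
        table.append(TaskEntry(number, [code], time))
    return table

registration_table = register_tasks(["VLOAD", "VADD", "VMUL"], processor="A")
```

Selecting a different column of the collation table (for example `processor="B"`) yields the processing times for another real processor without changing the virtual machine codes themselves.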

As to the method of generating the task relation table by the task division means 4, description has
already been made. Fig. 6 shows a task relation table for a group of tasks having a task relation graph
illustrated in Fig. 7.

Referring to Fig. 7, numerals each shown in a circle 11 represent the task identification numbers,
respectively, and numerals each affixed to the circle 11 represent the times taken for the processings of the
associated tasks, and arrows 13 indicate the inter-task sequential relations defined hereinbefore. In Fig. 6,
the inter-task relations are shown by indicating the ascendent task(s) to each of the tasks. It is however
apparent that the task relation table may also be implemented by indicating the descendent task(s) to each
of the tasks. Alternatively, both the ascendent tasks and descendent tasks may be registered in the task
relation table.

As will now be appreciated from the above, the task division means 4 segments or divides a group of
the virtual machine codes outputted from the precompiling means 2 to thereby generate the task
registration table indicating the contents of the individual tasks and the processing times taken for executing
them, respectively, as well as the task relation table indicating the inter-task sequential relations, whereby
information of the task processing time and that of the inter-task sequential relation can be supplied to the
task scheduling means 5 from the task division means 4.

According to the task division method adopted in the instant embodiment of the invention, one task
comprises one virtual machine code. However, the invention is not limited to such a task division method.
For example, one or plural statements before the precompilation may comprise one task, wherein the virtual
machine code or codes resulting from the precompilation or translation of the statement(s) may be
registered as one task in the task registration table, or alternatively one task may comprise a plurality of
machine codes. In these cases, the processing time for each task as well as the inter-task sequential
relation can equally be generated by the methods described previously.

The task scheduling means 5 serves for scheduling the group of tasks supplied thereto while
maintaining the inter-task sequential relations to thereby parallelize the tasks such that the entire processing
time taken for execution of all the tasks can approach a minimum or shortest processing time (i.e. the
processing time scaled along the critical path) in the parallel processing performed in the target machine 9
under the given conditions for the task, the task processing time and the inter-task sequential relation.

For realizing such task scheduling means, there may be employed, for example, such an algorithm as
mentioned below.

At first, the tasks are initially so positioned as to be processed as rapidly as possible (i.e. immediately
when the conditions permitting the task processing have been met) while taking into consideration the
inter-task sequential relation as given. Fig. 8 is a diagram showing the result of the initial positioning of the tasks
having the inter-task sequential relation and the task processing times shown in Fig. 7. Referring to Fig. 8,
each of rectangles 200 represents a task, and lines 201 indicate synchronized processing. With the phrase
"synchronized processing", it is intended to mean such wait or coordination processing performed in the
course of the parallel processing in which the individual processors are forced
to wait before the next processing until the tasks executed in parallel by the processors simultaneously have
all been completed, and in which the processors are allowed to proceed to the next processing only after
completion of the task processings performed by all the processors participating in the parallel processing
has been confirmed. For convenience of description, each synchronized processing is referred to as
a level, wherein a level executed in precedence is termed a level of a high rank while a level
executed in succession is referred to as a level of a low rank.
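
The synchronized processing described above behaves like a barrier. The following Python sketch is offered only as an analogy, using threads in place of the processors of the target machine and `threading.Barrier` in place of the synchronizing mechanism; the function `run_levels` and its arguments are hypothetical.

```python
import threading

def run_levels(levels, work):
    """Run the tasks level by level; no processor starts level k+1 early.

    `levels` is a list of levels, each a list of task numbers; `work` is a
    callable executed for each task. Returns a log of (level, task) pairs.
    """
    n = max(len(level) for level in levels)   # number of "processors"
    barrier = threading.Barrier(n)
    log, lock = [], threading.Lock()

    def processor(index):
        for number, level in enumerate(levels, start=1):
            if index < len(level):
                work(level[index])
                with lock:
                    log.append((number, level[index]))
            # Wait until every processor has finished the current level.
            barrier.wait()

    threads = [threading.Thread(target=processor, args=(k,)) for k in range(n)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log

log = run_levels([[2, 1], [4, 3], [5]], work=lambda task: None)
```

Because every thread records its task before reaching the barrier, all entries of a higher-rank level appear in the log before any entry of a lower-rank level.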

According to the instant embodiment of the invention, the levels defined above are affixed with the level
identification numbers sequentially on the time base such as level 1, level 2, level 3, ..., level n, as shown in
Fig. 8, wherein the level 1 is of the highest rank with the level n being of the lowest rank. Thus, the tasks
placed at a same level are susceptible to the parallel processing. When there exists an ascendent task A for
a task B to be executed at the level n, the ascendent task A must naturally lie at a level higher than the
level n.

The initial placement can be realized by resorting to an algorithm which will be described below.

At first, the initial value of the level number i is set to 1 (one), whereon the task(s) having the task 0 as
the ascendent task is registered in a level file shown in Fig. 9 at the level i (= 1). Upon placement or
positioning of the task and registration thereof, the tasks are arrayed sequentially in such order that the task
of the longest processing time is located at the leftmost position on any row.

Next, for the tasks other than those registered at the levels 1 to i, the presence of the ascendent tasks
for them is checked. When the ascendent tasks of such a task are all at the levels 1 to i, the task is registered at the
level (i + 1). Subsequently, the level number i is incremented to (i + 1), which is then followed by the
registration of the succeeding level in accordance with the algorithm. When there exist no tasks which are
not registered at the levels (i.e. when all the tasks have been registered in the level registration table), the
algorithm or procedure comes to an end.
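
The initial placement can be sketched in Python as follows. This is a minimal sketch under assumptions: tasks without any ascendent task stand in for the tasks having the task 0 as the ascendent task, the level file is modeled as a plain list of lists, and each level is assumed to last as long as its longest task. The names `initial_levels` and `schedule_time` are hypothetical.

```python
def initial_levels(ascendents, times):
    """Return the levels in order; each level lists task numbers, longest first.

    `ascendents` maps a task number to the tasks it must wait for;
    `times` maps a task number to its processing time.
    """
    placed = set()
    remaining = set(ascendents)
    levels = []
    while remaining:
        # A task is ready when all its ascendent tasks lie at earlier levels.
        ready = [t for t in remaining if set(ascendents[t]) <= placed]
        if not ready:
            raise ValueError("cyclic inter-task relation")
        # Longest task leftmost; ties broken by task number for determinism.
        ready.sort(key=lambda t: (-times[t], t))
        levels.append(ready)
        placed.update(ready)
        remaining.difference_update(ready)
    return levels

def schedule_time(levels, times):
    """Overall time when every level lasts as long as its longest task."""
    return sum(max(times[t] for t in level) for level in levels)

ascendents = {1: [], 2: [], 3: [1], 4: [1, 2], 5: [3, 4]}
times = {1: 4, 2: 6, 3: 2, 4: 5, 5: 3}
levels = initial_levels(ascendents, times)
total = schedule_time(levels, times)
```

With these illustrative inputs, tasks 1 and 2 are placed at level 1, tasks 3 and 4 at level 2 and task 5 at level 3, and the leftmost task of each level determines the time spent at that level.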

In the above, an example of the algorithm for the initial task placement has been described. Since the
result of the initial task placement can not necessarily reduce the entire processing time to a possible
minimum, a further reduction of the entire or overall processing time is attempted. To this end, an algorithm
mentioned below by way of example may be employed.

The levels at which the tasks are initially placed are defined as the first level (of the highest rank) to
n-th level (of the lowest rank), wherein the time taken for processing the leftmost task T_i1 of those belonging
to a given one i of the levels (i.e. the maximum task at the given level i) is represented by ST1 with the
processing time for the second task T_i2 from the left being represented by ST2. Further, the time taken for
processing the maximum task T_j1 belonging to the level j of a lower rank than the level i is represented by
DT1.

In the first place, by setting i equal to (n - 1), it is checked whether or not the leftmost task T_i1 of those
belonging to the level i can be moved to any one of the levels of lower rank than the level i, from the level (i + 1)
to the level n inclusive, while satisfying the condition for the inter-task sequential relation. Let's assume
that the task T_i1 can be moved without disturbing the inter-task sequential relation. In that case, the
destination level to which the task T_i1 is to be moved is represented by j, whereon E given by the
undermentioned expressions is evaluated:

E = DT1 - ST2, when ST1 > DT1
E = ST1 - ST2, when ST1 ≤ DT1.

in case E £ 0, it is determined that the task T h can be moved from the level i whereon the task T it is 
moved to the level j at which E is maximum. On the other hand, when it is determined that the task Tj, can 
not be moved, then the procedure proceeds to a processing which is performed when the leftmost task of 

to the level i can not be moved to the level j, as described later on. In case a plurality of levels j exist at which 
E becomes maximum, the task T (1 is then moved to the level of as low a rank as possible, and similar 
manipulation is performed for the task which assumes the leftmost position at the level i as the result of the 
move mentioned above. When the leftmost task at the level i can no more be moved to the level j. the 
remaining tasks of the level i are moved in the order starting from the task requiring the maximum 

is processing time to the levels as low as possible so far as the inter-task sequential relation permits on the 
condition that the overall processing time is not thereby increased. When the move is impermissible, the 
task is left at the initial level. 
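
The evaluation of E and the choice of the destination level can be sketched as below. The sketch assumes that the time of each level equals that of its longest task, so that the two levels cost ST1 + DT1 before the move and ST2 + max(ST1, DT1) after it; E is then the reduction in time. The function names and the `candidates` mapping are hypothetical, and the check of the inter-task sequential relation is assumed to have been made beforehand.

```python
def move_gain(st1, st2, dt1):
    """Reduction E in overall time when the longest task of level i moves to level j."""
    if st1 > dt1:
        return dt1 - st2
    return st1 - st2


def best_destination(st1, st2, candidates):
    """Pick the destination level j with maximum E >= 0.

    `candidates` maps each level j permitted by the inter-task sequential
    relation to DT1, the longest task time currently at that level.
    Ties go to the level of lowest rank (largest level number).
    Returns (j, E), or None when the task should stay where it is.
    """
    best = None
    for j, dt1 in candidates.items():
        e = move_gain(st1, st2, dt1)
        if e < 0:
            continue                      # the move would lengthen the schedule
        if best is None or (e, j) > (best[1], best[0]):
            best = (j, e)
    return best
```

For instance, with ST1 = 5 and ST2 = 3, a destination level whose longest task takes 7 yields E = 2, whereas one whose longest task takes 4 yields only E = 1.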

When the decision as to the movability has been completed for all the tasks, then i is set to (n - 2),
whereon the procedure described above is repeated, being followed by the similar procedures sequentially
by setting i to (n - 3), (n - 4), ..., 1 (one), respectively.

Through the manipulation or procedure described above, all the tasks can be moved to as low levels as
possible while maintaining the inter-task sequential relation on the condition that the overall processing time
can be shortened or at least remain invariable from that taken in the initial placement of the tasks. This
procedure is referred to as the downward sweep algorithm. Subsequently, all the tasks are moved to the
levels of as high ranks as possible on the conditions that the inter-task sequential relation can be
maintained and that the overall processing time can be shortened or at least remain invariable when
compared with the result of the scheduling performed by the first downward sweep algorithm. This
procedure is called the upward sweep algorithm. Since the upward sweep algorithm differs from the
downward sweep algorithm only in respect to the direction in which the tasks are moved, the former can be
realized by an algorithm similar to the downward sweep algorithm with the task move direction being
changed to the upward direction. By executing the downward sweep algorithm alternately with the upward
sweep algorithm, the overall processing time can progressively approach the processing time for the
critical path. By completing the execution of the upward sweep algorithm and the downward sweep
algorithm at the time when the overall processing time can no more be shortened by either the upward or
downward sweep algorithm, there can be attained the quasi-optimal task scheduling which can minimize the
entire processing time while satisfying the conditions given for the tasks, the inter-task sequential relation
and the task processing time. Fig. 10 is a diagram showing the result of the scheduling realized by moving
the tasks from the task positions shown in Fig. 8 in accordance with the algorithms mentioned above. As can
be seen from Fig. 10, the entire processing time can be shortened when compared with the initial
scheduling shown in Fig. 8.

The abovementioned scheduling is performed on the condition that all the processors are synchronized
simultaneously and is thus referred to as the simultaneously synchronized scheduling.

On the other hand, such arrangement may also be adopted in which teams are each constituted by
several ones of the processors, wherein the relevant tasks are processed on a team basis with the
processors within the team being synchronized (intra-team synchronization) while synchronization is
established among the teams, as occasion requires, which is referred to as the inter-team synchronization.
Further, the scheduling scheme using the team synchronization is called the team synchronous scheduling.

Fig. 11 is a diagram showing the result of the scheduling performed by using the team synchronization
for the result of the scheduling based on the simultaneously synchronized scheduling. Referring to Fig. 11,
the processors No. 0 and No. 1 constitute a team A with the processors No. 2 and No. 3 constituting a team B,
wherein during a period from a time point t0 to t1, the team A is in charge of the tasks 2, 5 and 1 with the
team B being assigned with the tasks 3, 4, 6, 7, 8 and 9, and the intra-team synchronization is established
for the teams A and B, separately, as indicated by lines 14 and 15 (intra-team synchronizations 14 and 15),
respectively. The scheduling for allowing the overall processing time to approach the critical path
processing time as closely as possible by using the team synchronization scheme in this manner is referred
to as the team-synchronization based scheduling. As is apparent from the comparison of Figs. 10 and 11,
the team-synchronization based scheduling can further reduce the overall processing time as compared
with the simultaneously synchronized scheduling. For carrying out the team-synchronization based
scheduling, an algorithm mentioned below may be employed after the initial task scheduling. Parenthetically, the
initial task scheduling may be performed in accordance with the simultaneously synchronized scheduling
mentioned above. In the description which follows, it is assumed that the tasks are present at all the levels 1
to n as the result of the initial task scheduling.

At first, the result of the initial scheduling is checked sequentially from the high level to the low level to
determine first whether the critical path extends through the task of the longest processing time at a level i.
When it is determined that the critical path extends through the task requiring the longest processing time
at a given level, then the next level is checked similarly. If otherwise, the level i and the level (i + 1) are
provisionally linked together, whereon those tasks which belong to the linked levels and which bear the
inter-task sequential relation to one another are collected to thereby constitute a team, wherein for each of the
teams, the task scheduling is performed in accordance with the simultaneously synchronized scheduling
algorithm described previously. After the scheduling, the task requiring the longest processing time in each
team is compared with the processing time of that task before constituting the team. When the comparison
shows that the processing time is further reduced, that team is then registered, whereon requisite numbers
of processors Nos. 0, 1, 2 and so forth are assigned to the teams sequentially, starting from the team
registered first. Unless the processing time is diminished, a processing U described hereinafter is resorted
to. When the processing time is improved, the procedure described above is repeated for a team
constituted newly by collecting the tasks belonging to the three simultaneously synchronized levels i, i + 1
and i + 2 and interconnected through the inter-task sequential relation irrespective of the teams already
registered. When the processing time is reduced, the next level is added to constitute a new team and the
similar procedure for examination is repeated. When the processing time can no more be shortened, the
procedure proceeds to the processing U where the team constituted in the manner mentioned above is
sequentially added with the levels of higher rank to reconstitute the teams for diminishing the overall
processing time.

At the time when the overall processing time can no more be reduced, decision is again made for the
lowest level of the reconstituted team as to whether or not the critical path extends through the task
requiring the longest processing time, whereon the procedure described above is repeated in dependence
on the result of the decision.

The initial scheduling to be performed in precedence to the start of the team-synchronization based
scheduling may be carried out in accordance with a method described below.

At first, the simultaneously synchronized scheduling is performed. It is assumed that as the result of
this simultaneously synchronized scheduling, the tasks are present at all the levels 1 to n. In that case, the
initial value of the level number i is set to 1 (one), whereon the levels i and (i + 1) are provisionally linked
to each other. By collecting those tasks belonging to the linked levels which bear the inter-task sequential
relation to one another, a task team is constituted, whereon the scheduling is performed for each team in
accordance with the simultaneously synchronized scheduling algorithm described hereinbefore. The
processing times required by the tasks in each team after the abovementioned scheduling are compared with
those before the scheduling. When the comparison shows that the processing time can be reduced, the
corresponding team is registered. Thereafter, the requisite numbers of the processors Nos. 0, 1, 2 and so
forth are sequentially assigned to the teams in order, starting from the first registered team. Unless any
improvement can be seen in the processing time after the abovementioned scheduling, then the level
number i is incremented by one (i.e. i = i + 1), and the procedure mentioned above is repeated. On the
other hand, when the processing time is improved, those tasks belonging to the three simultaneously
synchronized levels i, i + 1 and i + 2 which bear the inter-task sequential relation to one another are
collected to constitute a new team irrespective of the teams already registered, whereon the procedure
described above is repeated. When the processing time can thus be reduced, the next level is added to
reconstitute the team, whereon the similar examination procedure is performed. When the processing time
can no more be reduced, the team existing at that time point is registered finally. Subsequently, for the
succeeding level j, the level number i is set equal to j, whereon the abovementioned algorithm is repeatedly
applied. Upon completion of the examination up to the level n, the initial scheduling comes to an end.
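
The step of collecting the tasks of the linked levels into teams can be sketched as follows. The sketch rests on an assumption of interpretation: tasks which "bear the inter-task sequential relation to one another" are taken to be those connected, directly or through other tasks of the linked levels, by such relations. The function name `form_teams` and the data layout are hypothetical.

```python
def form_teams(level_of, relations, first_level, last_level):
    """Group the tasks of levels first_level..last_level into teams.

    `level_of` maps task -> level; `relations` is an iterable of
    (ascendent, descendent) pairs. Tasks joined, directly or transitively,
    by a relation inside the linked levels fall into the same team.
    """
    members = [t for t, lv in level_of.items() if first_level <= lv <= last_level]
    parent = {t: t for t in members}      # union-find forest over the members

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, d in relations:
        if a in parent and d in parent:    # ignore relations leaving the linked levels
            parent[find(a)] = find(d)

    teams = {}
    for t in members:
        teams.setdefault(find(t), set()).add(t)
    return sorted(teams.values(), key=lambda team: sorted(team))
```

Each returned team would then be rescheduled on its own by the simultaneously synchronized scheduling algorithm and registered only if its processing time is thereby reduced.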

As will now be appreciated from the above, the task scheduling means 5 which executes the scheduling
algorithms based on the simultaneous synchronization or team synchronization scheme can minimize the
overall processing time to thereby make the latter approach the processing time scaled along the critical
path under the conditions given for the task, the task processing time and the inter-task sequential relation
as supplied from the task division means 4. As the result of this scheduling, there are generated a level file
indicating the tasks belonging to the individual levels, a processor file indicating the tasks to be processed
by the individual processors and the types of synchronizations in the order of the processings to be
executed, and a common variable file indicating the names of common variables and addresses thereof.
Fig. 22 shows a level file corresponding to the result of scheduling described above by reference to Fig. 10.
Further, Fig. 19 shows the corresponding processor file, in which reference symbol S0 represents
simultaneous synchronization instructions or commands. It should be mentioned that when other
synchronization instructions or commands such as a team synchronization command and the like are inserted, a
processor file similar to that shown in Fig. 19 may be generated.

Fig. 25 shows, by way of example, a processor file corresponding to the result of the scheduling based
on the team synchronization illustrated in Fig. 11. Referring to Fig. 25, reference symbol S0 represents a
simultaneous synchronization command, S1 represents an intra-team synchronization command for a team
including the processors Nos. 0 and 1, and S2 represents an intra-team synchronization command for a
team including the processors Nos. 2 and 3.

After the scheduling mentioned above, the task scheduling means 5 may move a task assigned to a
certain processor at a given level to a level at which there exists a processor assigned with no task, so far
as the inter-task sequential relation permits and the entire processing time is not thereby increased, so
that the number of the processors to be used can be decreased to a possible minimum. By way of example,
a task T assigned to the level (n + 1) where the processor No. 4 is present may be moved to the level 3
where the processor No. 3 exists, as is illustrated in Fig. 12, to
thereby decrease the number of the processors to be used without increasing the overall processing time.

Further, a task assigned to a certain one of the processors may be linked to a task assigned to another 
processor to thereby decrease the number of the processors to be used while maintaining the inter-task 
sequential relation, so long as the overall processing time is not increased. 

By way of example, referring to Fig. 13, a task T assigned to the level (n + 1) of the processor No. 4 
may be linked to a task P assigned to the level n of the processor No. 3, while a task S assigned to the 
level (n + 1) of the processor No. 5 may be linked to a task Q assigned to the level (m + 1) of the
processor No. 5. The task linkings mentioned above are possible only when the inter-task sequential 
relation permits such moves of the tasks T and S. 

In the abovementioned case, the number of the processors to be used can be reduced from five to
three. With the task scheduling means 5 of this type, the task schedule which allows the abovementioned
multi-processor system to perform the parallel processing with a high efficiency can be generated at a high
speed. The task scheduling means 5 may further be imparted with a task linking information generating
function, as described below.

Referring to Fig. 14, when a group of tasks including tasks (having task names or identifiers represented
by symbols in circles), the task processing times (indicated by numerals affixed to the circles) and the
inter-task sequential relation (indicated by task interconnecting arrows) are scheduled by the task scheduling
means 5 described above, there can be obtained such results as illustrated in Fig. 16. Referring to the
figure, $TPT_i$ represents the longest one of the processing times for the tasks belonging to the level i and
$TCT_i$ represents the time involved in the parallel processing for the level i (such as overhead related to the
synchronization, communication and others). Assuming now that the result of scheduling shown in Fig. 16 is
to be executed, the overall processing time $T_1$ is given by

$$T_1 = \sum_{i=1}^{n} (TPT_i + TCT_i)$$

where n represents the number of the levels. Accordingly, in order to shorten the overall processing time $T_1$
to a minimum under given conditions for the tasks, the task processing times and the inter-task sequential
relation, it is required to minimize $\sum_{i=1}^{n} TPT_i$ and $\sum_{i=1}^{n} TCT_i$. However, since the
algorithm for minimizing $\sum_{i=1}^{n} TPT_i$ has already been executed (as described in conjunction with the
scheduling algorithm), it is only necessary to minimize $\sum_{i=1}^{n} TCT_i$. To this end, either each of
$TCT_i$ or the number of the levels may be decreased. More specifically, for the target machine 9, the
algorithm for minimizing $\sum_{i=1}^{n} TPT_i$ has already been executed (in conjunction with the scheduling
algorithm). Accordingly, it is required to minimize $\sum_{i=1}^{n} TCT_i$. This can be achieved by decreasing
either each $TCT_i$ or the number of the levels or synchronizations. In the target machine 9, the processing
for synchronization is implemented in hardware with $TCT_i$ being minimized. Thus, by adopting a method of
decreasing the level number n, $\sum_{i=1}^{n} TCT_i$ is minimized to be as small as possible. In the target
machine 9, since $TCT_i$ is substantially due to overhead related to the synchronization, the minimization of
$\sum_{i=1}^{n} TCT_i$ can be achieved by decreasing the number n of synchronizations. Accordingly, those of
the tasks existing at two or more levels of which linkage does not lead to increasing of
$\sum_{i=1}^{n} TPT_i$ are linked together to decrease the number of synchronizations and hence
$\sum_{i=1}^{n} TCT_i$, to thereby shorten the overall processing time. Now, let's define a program parallel
processing efficiency $\eta$ as a ratio of the net task processing time $\sum_{i=1}^{n} TPT_i$ to the overall
processing time inclusive of the parallel processing overhead, $\sum_{i=1}^{n} (TPT_i + TCT_i)$. Namely,

$$\eta = \frac{\sum_{i=1}^{n} TPT_i}{\sum_{i=1}^{n} (TPT_i + TCT_i)}$$

It is thus apparent that the procedure described above contributes to enhancement of the parallel
processing efficiency. Taking as an example the result of scheduling shown in Fig. 16, the levels
1 and 2 are linked, whereby the tasks 1a and 1b are linked into a task 1 while the tasks 2a and 2b are
linked into a task 2. Subsequently, the levels 3, 4 and 5 are linked, whereby the tasks 4a, 4b and 4c are
linked into a task 4, while the tasks 3a and 3b are linked into a task 3. As a result of this, the tasks 1 to 4
newly generated can be scheduled in such a manner as illustrated in Fig. 17. Consequently, when the
processing is executed in accordance with the schedule shown in Fig. 17, the overall processing time $T_2$
can be shortened by $\Delta T$ when compared with the processing time $T_1$ shown in Fig. 16, to advantage. It
should be noted that the task linkage mentioned above changes not only the task content and the task
processing time but also the inter-task sequential relation. When the task linkages illustrated in Fig. 17 are
performed, the inter-task sequential relation shown in Fig. 14 is changed to such a relation as shown in
Fig. 15. In this way, the task scheduling means 5 can generate task linkage information for the tasks which can
diminish the overall processing time when linked. Of course, it should be understood that the task
scheduling means 5 may also be so arranged as to perform only the scheduling without generating the task
linkage information.

Fig. 18 is a diagram for illustrating, by way of example, conditions for the inter-task
relations and the task processing times which allow the overall processing time to be shortened by applying
the task linkage scheme. Referring to Fig. 18 and assuming that the levels i and (i + 1) are to be linked, it
can be seen that only one descendent task $B_j$ exists at the level (i + 1) for a given task $A_j$ belonging to the
level i, while the ascendent task at the level i for the descendent task $B_j$ is only the task $A_j$. Accordingly,
linkage of the tasks $A_j$ and $B_j$ can decrease $\sum_{i=1}^{n} TCT_i$ without increasing
$\sum_{i=1}^{n} TPT_i$. Thus, it is decided that the task $A_j$ can be linked with the task $B_j$.

Fig. 20 is a flow chart for illustrating an
algorithm for making decision as to the possibility of the task linkage such as mentioned above. Referring to
Fig. 20, the state in which the task scheduling by the task scheduling means 5 has been completed is
represented by "START". Since the team synchronization can be regarded as the simultaneous
synchronization performed internally of the team, this task linkage algorithm can be applied either to the scheduling
based on only the simultaneous synchronization or to that based on the team synchronization. Further, in
Fig. 20, the statement "linking of task A with task B" means the generation of a task link file (i.e. a file
containing task linkage information) such as the one shown in Fig. 21. More specifically, Fig. 21 shows a task link file
generated by applying the abovementioned algorithm to the result of scheduling shown in Fig. 16. Besides,
a processor file corresponding to the result of scheduling shown in Fig. 16 is also generated by the task
scheduling means 5, as shown in Fig. 24. In this figure, symbol S0 represents a simultaneous
synchronization command. When the task link file shown in Fig. 21 is generated by the task scheduling means 5, the
synchronization commands inserted between the linked tasks are deleted in the processor file shown in
Fig. 24 by referring to the task link file. Fig. 23 shows a processor file thus obtained. It should however be
understood that unless the task scheduling means 5 generates the task linking information, the
synchronization information is not deleted. In Fig. 23, S0 indicates the simultaneous synchronization. When the tasks
are executed in accordance with the processor file shown in Fig. 23, the results are then such as shown in
Fig. 17, from which it can be seen that the processing time is shortened by $\Delta T$ when compared with the
processing time before the task linking.
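
The linkage condition described with reference to Fig. 18 can be sketched as follows. This is a minimal illustration of that one-to-one condition only, not a transcription of the flow chart of Fig. 20; the function name `linkable_pairs` and the data layout are hypothetical.

```python
def linkable_pairs(level_of, relations, i):
    """Return the (A, B) pairs across levels i and i + 1 that may be linked.

    A task A at level i is linkable with a task B at level i + 1 when B is the
    only descendent of A at level i + 1 and A is the only ascendent of B at
    level i. `relations` holds (ascendent, descendent) pairs.
    """
    desc = {}   # task at level i     -> its descendents at level i + 1
    asc = {}    # task at level i + 1 -> its ascendents at level i
    for a, b in relations:
        if level_of.get(a) == i and level_of.get(b) == i + 1:
            desc.setdefault(a, set()).add(b)
            asc.setdefault(b, set()).add(a)
    pairs = []
    for a, ds in desc.items():
        if len(ds) == 1:
            (b,) = ds
            if asc[b] == {a}:
                pairs.append((a, b))
    return sorted(pairs)
```

Each returned pair corresponds to one entry of the task link file; the synchronization command separating the two tasks in the processor file would then be deleted.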

As will be appreciated from the above description, the task scheduling means 5 receives as the inputs 
thereto information of the tasks, the task processing time and the inter-task relation to generate the result of 
task scheduling (block file), information about the tasks to be executed by the individual processors and the 
synchronization (processor file) and additionally the common variable table. Further, the task scheduling 
means 5 may generate the task linkage information (task link file). Among them, the processor file and the 
common variable table are supplied to the parallel compiling means 6 which in turn refers to the task 
registration table for translating the task synchronization instruction to the virtual machine code and 
additionally translating the virtual machine codes to the real machine codes with the aid of the virtual 
machine code - to - real machine code translation table. 

As the result of this, there is generated a file (object program) in which the real machine codes to be 
executed by each processor are arrayed in the order of execution. This file or object program is supplied to 
the target machine 9, which then performs the parallel processing in accordance with the object program 
supplied thereto. 

With the program parallelizing apparatus according to the instant (first) embodiment of the invention,
many of the parts contained in the source program which are susceptible to the parallel processing can be
parallelized by virtue of the division of the source program into the smaller units or tasks. The tasks
are parallelized such that the overall processing time involved in the parallel processing of the source
program by the target machine can approach the minimum processing time realizable under the given
conditions for the tasks, the task processing times and the inter-task sequential relations. Thus, there can be
obtained the object codes as well as the sequence capable of being processed in parallel with a high
processing efficiency, to great advantage. Besides, because of the use of the scheduling algorithm adopting
the concept of the level in the scheduling means 5, the scheduling can be carried out at an extremely high
speed, to another advantage. It is further noted that in the case of the program parallelizing apparatus
according to the instant embodiment, the precompiling means 2, the task segmentation means 4 and the
task scheduling means 5 are adapted to handle the virtual machine codes independently of the types of the
processors of the multi-processor system. It is only the parallel compiling means 6 that is required to be
modified in dependence on the real machine codes. Thus, the program parallelizing apparatus 10 according
to the instant embodiment of the present invention can easily cope with the change or modification of the
processors in the target machine 9, to further advantageous effect.



Second Embodiment 

Fig. 26 is a schematic block diagram showing a general arrangement of the program parallelizing
apparatus according to a second embodiment of the invention. As can be seen from this figure, the program
parallelizing apparatus 30 according to the instant embodiment comprises a precompiling means 23, a
nucleus system routine 24, a task division means 25, a task scheduling means 26 and a parallel compiling
means 27.

A target machine 31 for executing the program parallelized by the program parallelizing apparatus 30 may
be similar to the target machine 9 described hereinbefore in conjunction with the first embodiment of the
invention.

The program parallelizing apparatus 30 according to the instant embodiment is essentially the same as that
of the first embodiment so far as the precompiling of a source program 22 and the translation into virtual
machine codes are concerned. The task division means 25 divides a group of virtual machine codes into
individual tasks each corresponding to one virtual machine code to thereby generate a task registration
table and a task relation table by a method similar to that described hereinbefore in conjunction with the
first embodiment and supplies information about the tasks, the task processing times and the inter-task
relations to the task scheduling means 26, which in turn parallelizes the tasks on the basis of the
information of the task, the task processing times and the inter-task relation supplied from the task division
means 25 while maintaining the inter-task sequential relation so that the overall time involved in the parallel
processing of the given tasks by the target machine 31 including a plurality of processors can approach
the minimum processing time (processing time scaled along the critical path) which can be realized under
the given conditions of tasks, task processing time and the inter-task sequential relation, as in the case of
the first embodiment.

Because the task scheduling means 26 adopts the method similar to that of the first embodiment, the
tables and the files generated and outputted by this means 26 are similar to those of the first embodiment.
More specifically, the task scheduling means 26 outputs task linkage information (i.e. a task link file).
The task linkage information is supplied to the task division means 25 which regenerates the tasks on
the basis of the task linkage information. In other words, those tasks which are indicated to be linked
together in the task link file are linked into one task. For linking tasks $T_1$ to $T_n$ in this order into a task NT,
there are performed procedures or manipulations (1) and (2) mentioned below.

1) In the task registration table, the task contents (virtual machine codes) of the task NT are
registered as $OP_1, OP_2, \ldots, OP_n$ (where $OP_i$ is a virtual machine code of the task $T_i$) while the task
processing time TPT(NT) of the task NT is registered as

$$TPT(NT) = \sum_{i=1}^{n} TPT(T_i)$$

where $TPT(T_i)$ represents the time required for processing the task $T_i$.

2) In the inter-task relation table:

i) In case the inter-task relation is indicated by the ascendent tasks, those of $AT(T_1), AT(T_2), \ldots,
AT(T_n)$ except $T_1, T_2, \ldots, T_n$ are registered in the inter-task relation table as the ascendent tasks of the task
NT. (Note that $AT(T_i)$ represents the ascendent task of the task $T_i$.)

ii) In case the inter-task relation is indicated by the descendent tasks, those of $DT(T_1), DT(T_2), \ldots,
DT(T_n)$ except $T_1, T_2, \ldots, T_n$ are registered in the inter-task relation table as the descendent tasks of the task
NT. (Note that $DT(T_i)$ represents the descendent task of the task $T_i$.)

iii) In case the inter-task relation is indicated by both the ascendent tasks and the descendent
tasks, those of $AT(T_1), AT(T_2), \ldots, AT(T_n)$ except $T_1, T_2, \ldots, T_n$ are registered in the inter-task relation table
as the ascendent tasks of the task NT while those of $DT(T_1), DT(T_2), \ldots, DT(T_n)$ except $T_1, T_2, \ldots, T_n$ are
registered in the inter-task relation table as the descendent tasks of the task NT.
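
Manipulations (1) and (2) can be sketched as below for the case in which both ascendent and descendent tasks are recorded. This is a minimal illustration under stated assumptions: the function name `link_tasks` and the dictionary-based tables are hypothetical, and the final step, which redirects references held by the remaining tasks to NT, is a bookkeeping step not spelled out in the text.

```python
def link_tasks(name, chain, registration, ascendents, descendents):
    """Replace the tasks in `chain` (T_1..T_n, in order) by one task `name`.

    registration: task -> (list of virtual machine codes, processing time)
    ascendents / descendents: task -> set of tasks
    The three tables are modified in place.
    """
    inside = set(chain)
    # (1) Task registration table: concatenate the codes, sum the times.
    codes = [op for t in chain for op in registration[t][0]]
    tpt = sum(registration[t][1] for t in chain)
    # (2) Inter-task relation table: union of relations, minus the linked tasks.
    asc = set().union(*(ascendents[t] for t in chain)) - inside
    des = set().union(*(descendents[t] for t in chain)) - inside

    for t in chain:
        del registration[t], ascendents[t], descendents[t]
    registration[name] = (codes, tpt)
    ascendents[name] = asc
    descendents[name] = des

    # Assumed bookkeeping: remaining tasks that referred to a linked task
    # now refer to the new task instead.
    for table in (ascendents, descendents):
        for t, related in table.items():
            if t != name and related & inside:
                table[t] = (related - inside) | {name}
```

After the call, the regenerated tables can be handed back to the scheduler unchanged in form, which is what allows the scheduling to be repeated on the linked tasks.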

By modifying the task registration table and the inter-task relation table in the manner mentioned above,
the tasks are regenerated, whereon the task processing time information and the inter-task relation
information are sent to the task scheduling means 26 to be scheduled in accordance with an algorithm similar
to that described previously so that the overall processing time can approach the critical path processing
time as closely as possible.

The task scheduling, the task linking and the task regeneration are repeated in the manner described
above. At the time when the overall processing time inclusive of parallel processing overhead can no more
be shortened, the abovementioned repetition is stopped, whereon the processor file and the common
variable table generated by the task scheduling means 26 at that time point are supplied to the parallel
compiling means 27. Subsequent operations inclusive of that of the parallel compiling means 27 are similar
to those of the first embodiment.
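
The repetition of scheduling, linking and regeneration can be sketched as a simple loop. All four callables passed to the hypothetical function `refine` are stand-ins for the task scheduling means 26 and the task division means 25; their names and signatures are assumptions made for this illustration only.

```python
def refine(tasks, schedule, find_links, regenerate, overall_time):
    """Repeat scheduling, linking and regeneration until no further gain.

    Hypothetical callables:
      schedule(tasks)            -> a schedule
      find_links(tasks, sched)   -> task linkage information (empty if none)
      regenerate(tasks, links)   -> new task set with the linked tasks merged
      overall_time(sched)        -> overall time including parallel overhead
    """
    best_tasks, best_sched = tasks, schedule(tasks)
    best_time = overall_time(best_sched)
    while True:
        links = find_links(best_tasks, best_sched)
        if not links:
            break                       # nothing left to link
        new_tasks = regenerate(best_tasks, links)
        new_sched = schedule(new_tasks)
        new_time = overall_time(new_sched)
        if new_time >= best_time:
            break                       # no more shortening: stop repeating
        best_tasks, best_sched, best_time = new_tasks, new_sched, new_time
    return best_tasks, best_sched, best_time
```

The schedule returned at the end plays the role of the result from which the processor file and the common variable table are supplied to the parallel compiling means 27.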

Advantageous effects brought about by the program parallelizing apparatus according to the instant
(second) embodiment are mentioned below.

In general, when a source program is divided or segmented into a plurality of tasks which are then
scheduled for allowing the parallel processing thereof so that the overall processing time can be diminished
to a possible minimum, it is preferred to segment the source program into tasks of as small a size as
possible to thereby increase the number of the tasks susceptible to the parallel processing, provided that
overhead (such as synchronization overhead and others) involved in the parallel processing is null.
However, in the actual parallel processing system, since overhead involved in the parallel processing is
inevitable, segmentation of the source program into tasks of excessively small size will increase the
number of operations such as synchronization processings, communications and others which provide
causes for increasing overhead in the parallel processing, resulting in that the overall processing time
inclusive of parallel processing overhead is increased, far from being reduced. However, by using the
program parallelizing apparatus according to the instant embodiment of the invention in which the tasks are
disassembled to minimum units to be scheduled for the parallelization, there can first be obtained the
optimal result of scheduling on the assumption that the parallel processing overhead is null. Then, the size
or magnitude (processing time) of each task is increased by linking and regeneration of the task so far as
the overall processing time is not thereby increased on the assumption that the parallel processing
overhead is null, which is then followed by repetition of scheduling for the parallelization. By virtue of this
feature, the overall processing time inclusive of overhead involved in the parallel processing can
significantly be reduced, to great advantage.

Further, in the program parallelizing apparatus according to the instant embodiment, only the
parallel compiling means 27 needs to be modified in accordance with the real machine codes.
Consequently, the program parallelizing apparatus 30 according to the second embodiment of the invention
can easily cope with changes of the processors in the target machine 31.


Third Embodiment 

Fig. 27 is a diagram showing a third embodiment of the present invention. The program parallelizing
apparatus 219 according to the instant embodiment comprises a control-oriented language compiler 202, a
nucleus system routine (nucleus language) 206, a task divider 207, a task scheduler 208, and a parallel
compiler 213 and may further include, as occasion requires, an operation-oriented language compiler A203,
another operation-oriented language compiler B204 and an AI-oriented language compiler 205. Besides, a
serial compiler 212 may further be provided. As the parallel compiler 213, there must be provided at least
one of a complete parallelizing compiler 214, a vectoring compiler 215 and a distributed function compiler 216.

A target machine 217 for executing the program parallelized by the program parallelizing apparatus
219 according to the instant embodiment may be similar to the target machine 9 of the first embodiment.
The target machine 217 can be monitored with the aid of a system monitor 218 and a host monitor 211.
The task scheduler 208 is described by using a scheduling language 209 oriented for process management
and BASIC language 210. A source program described in a control-oriented language is translated into
virtual machine codes by the control-oriented language compiler 202 in accordance with a virtual machine
code prescribed by the nucleus system routine (nucleus language) 206. Similarly, a source program
described in an operation-oriented language A is translated by the operation-oriented
language compiler A203, a source program written in an operation-oriented language B is translated by the
operation-oriented language compiler B204 and a source program described in an AI-oriented language is
translated by the AI-oriented language compiler 205 into the virtual machine codes in accordance with
prescriptions of the nucleus system routine (nucleus language) 206, respectively. When there is no
path 220 from the task scheduler 208 to the task divider 207, the latter operates similarly to the task division
means 4 described hereinbefore in conjunction with the first embodiment, while the task scheduler 208
operates similarly to the task scheduling means 5 of the first embodiment. Further, when the parallel
compiler 213 includes the complete parallelizing compiler 214 and when the function of the latter is selected, the
parallel compiler 213 operates similarly to the parallel compiling means 6 of the first embodiment.

The object codes and the sequence now susceptible to the parallel processing with a high efficiency 
through the procedures and processings mentioned above are sent from the parallel compiler 213 to the 
target machine 217 where the parallel processing is executed. 

When the parallel compiler 213 includes the vectoring compiler 215 and when the function of the latter
is selected, the processor file and the common variable table supplied from the task scheduler 208 to the
parallel compiler 213 are translated to vectored real machine codes and sequence by the parallel compiler
213 so that the target machine 217 can perform the vector processing correspondingly.

When the parallel compiler 213 includes the distributed function compiler 216 and when this function is
selected, the parallel compiler 213 translates the processor file and the common variable table sent from
the task scheduler 208 into real machine codes and sequence so that the distributed function processing
can be performed by the target machine 217. The real machine codes and sequence resulting from the
above translation are supplied to the target machine 217 which then performs the distributed function
processing correspondingly.

When the target machine is to perform serial processing, the serial compiler 212 generates real
machine codes and sequence oriented for serial processing on the basis of the processor file outputted for
the serial processing from the task scheduler 208. The generated codes and sequence are then sent to the
target machine.

It should be mentioned that when the serial processing or the distributed function processing is 
performed, the task information and the inter-task relation information may be sent to the serial compiler 
212 and the parallel compiler 213 from the task divider 207 without the intervention of the task scheduler 
208. 
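
The selection among the compilers described above may be pictured as a simple routing decision. The
following Python sketch is only an illustration under stated assumptions: the names Mode and pipeline, the
representation of each stage as a text label, and the error handling are hypothetical and do not appear in
the embodiment.

```python
from enum import Enum


class Mode(Enum):
    """Back ends named in the third embodiment (labels only, for illustration)."""
    COMPLETE_PARALLEL = "complete parallelizing compiler 214"
    VECTOR = "vectoring compiler 215"
    DISTRIBUTED = "distributed function compiler 216"
    SERIAL = "serial compiler 212"


PARALLEL_MODES = {Mode.COMPLETE_PARALLEL, Mode.VECTOR, Mode.DISTRIBUTED}


def pipeline(mode, installed, bypass_scheduler=False):
    """Return the ordered stages a program passes through for the selected mode."""
    # The parallel compiler 213 must contain at least one of its three variants.
    if not (PARALLEL_MODES & set(installed)):
        raise ValueError("at least one parallel compiler variant must be provided")
    if mode not in installed:
        raise ValueError(f"{mode.value} is not provided")
    stages = ["task divider 207"]
    if bypass_scheduler:
        # Only serial and distributed function processing may skip the scheduler.
        if mode not in (Mode.SERIAL, Mode.DISTRIBUTED):
            raise ValueError("the scheduler cannot be bypassed for this mode")
    else:
        stages.append("task scheduler 208")
    stages += [mode.value, "target machine 217"]
    return stages
```

For example, pipeline(Mode.SERIAL, set(Mode), bypass_scheduler=True) yields the task divider 207, the
serial compiler 212 and the target machine 217, with the task scheduler 208 omitted.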

On the other hand, when there exists the path 220 from the task scheduler 208 to the task divider 207,
the latter can operate similarly to the task division means 25 of the second embodiment of the invention,
while the task scheduler 208 operates similarly to the task scheduling means 26 of the second embodiment.
Other operations are substantially the same as described above on the assumption that the path 220 is absent.

The program parallelizing apparatus according to the instant embodiment can enjoy the advantages
mentioned below in addition to those of the first and second embodiments.

The program parallelizing apparatus can cope with a plurality of languages such as the control-oriented
language, operation-oriented languages A and B, and the AI-oriented language as well as the serial
processing performed by the target machine. Besides, the program parallelizing apparatus according to the
instant embodiment allows the target machine to perform the vector processing and the distributed function
processing.



Fourth Embodiment 

Fig. 28 is a block diagram showing a general arrangement of the program parallelizing apparatus
according to a fourth embodiment of the invention. This program parallelizing apparatus 70 includes an
information processing unit 111 and a storage or memory 112 and is connected to a target machine denoted
generally by 73 through an interface 71.

Programs describing the algorithm employed in the program parallelizing apparatus as described
hereinbefore in conjunction with the first to third embodiments of the invention are stored in the storage or
memory 112 incorporated in the program parallelizing apparatus 70. The object codes and sequence made
susceptible to the parallel processing by the target machine 73 as the result of execution of the algorithm by
the information processing unit 111 are sent to the target machine 73 through the interface 71 and an
externally extended channel 97. In the case of the instant embodiment, it is assumed that the target
machine 73 is constituted by a parallel processing apparatus disclosed in JP-A-63-101957.

By using the externally extended channel 97, the program parallelizing apparatus 70 can directly make
access to shared memories 106 to 108, which in turn means that the object codes as well as the sequence
(object program) generated by the program parallelizing apparatus 70 can be sent to the shared memories
106 to 108, whereby the processors 77 to 82 can execute processings in parallel correspondingly under the
control of parallel processing mechanisms 87 to 92, respectively. Communication among the processors 77 to 82 is
realized by using the shared memories 106 to 108 connected individually to the processors 77 to 82
through a bus to which signal control circuits 93 to 96 can make random access at a high speed with high
efficiency, as described hereinbefore in conjunction with the first embodiment. Even when the number of
the processors incorporated in the target machine 73 differs from that of the instant embodiment shown in
Fig. 28, similar operations and effects can be ensured. Each of the processors 77 to 82 includes a CPU (Central
Processing Unit) 250, a CPU local memory 251 and a buffer 252, as shown in Fig. 29. Accordingly, the
information of the object codes and sequences to be executed by the individual processors may be stored
in the CPU local memories of the processors in place of the shared memories 106 and 107. It is further
noted that by using externally extended channels 84 to 86, the object codes and sequence generated by
the program parallelizing apparatus 70 according to the instant embodiment can be supplied to the target
machine 73.

The program parallelizing apparatus 70 according to the instant embodiment may be implemented by
using, for example, a personal computer, a work station or the like, singly or in plurality. Further, a
single or plural ones of the processors incorporated in the target machine may be used as the program
parallelizing apparatus 70.

The program parallelizing apparatus according to the instant (fourth) embodiment of the invention
provides, in addition to the advantages of the first to third embodiments, the advantage that object codes
and sequence for allowing the target machine 73 to perform the parallel processing with high efficiency can
be obtained.

Claims 


1. A program parallelizing apparatus for generating from a source program to be executed object
programs susceptible to parallel processing by a multi-processor system (9) which includes a plurality of
processors (77), communication means for allowing communication among said processors and synchronizing
means for allowing the processing to proceed in parallel among said processors through coordination
(wait) processing, comprising:

task division means (4, 25, 207, 111) for dividing the source program (1) to be executed into tasks to
generate information of inter-task relation by checking operand data contained in said source program for
thereby determining those tasks of which execution is influenced by data resulting from execution of given
tasks or said given tasks of which execution influences execution of said those tasks and information of task
processing time by adding the times required for processing machine instructions which constitute said
tasks, respectively;

task scheduling means (5, 26, 208, 111) responsive to said task processing time information and said
inter-task relation information outputted from said task division means for thereby generating groups of tasks to
be executed by said processors, respectively, as well as processing sequence information therefor such
that the tasks capable of being executed in parallel in said multi-processor system (9) are executed
separately by the different processors (77), respectively, without involving contradiction in the inter-task
sequential relation and generating synchronization information indicating time points at which said
synchronizing means should perform said coordination (wait) processing; and

parallel compiling means (6, 27, 213, 111; 112) having a function for defining the tasks generated by said
task scheduling means which tasks to be assigned to each of said processors of said multi-processor
system and said processing sequence information therefor as object programs to be executed by the
individual processors, respectively, in said multi-processor system and assigning said object programs to
said processors, respectively.

2. A program parallelizing apparatus according to claim 1, wherein said task scheduling means (26) is
imparted with a function for generating task link information for linking into one task those tasks undergone
the task scheduling which can decrease, when linked together, the number of times said coordination (wait)
processing is to be performed by said synchronization means and hence the time required for executing
the parallel processing in said multi-processor system (9).

3. A program parallelizing apparatus according to claim 2, wherein said task scheduling means (26) is
imparted with a function for linking the tasks in accordance with said task link information.

4. A program parallelizing apparatus according to claim 2, wherein said task division means (25) is
imparted with a function for linking the tasks to thereby generate a new task in accordance with the task link
information generated by said task scheduling means (26).

5. A program parallelizing apparatus according to claim 1, further including precompiling means (2, 23,
202, 203; 204; 205) for translating the source program (1) into a pre-object program including virtual
machine codes expressing individual instructions for various processors in general terms.

6. A program parallelizing apparatus according to claim 2, further including precompiling means (2, 23, 
202, 203; 204; 205) for translating the source program (1) into a pre-object program including virtual 
machine codes expressing individual instructions for various processors in general terms. 

7. A program parallelizing apparatus according to claim 3, further including precompiling means (2, 23,
202, 203; 204; 205) for translating the source program (1) into a pre-object program including virtual
machine codes expressing individual instructions for various processors in general terms.

8. A program parallelizing apparatus according to claim 4, further including precompiling means (2, 23,
202, 203; 204; 205) for translating the source program (1) into a pre-object program including virtual
machine codes expressing individual instructions for various processors in general terms.

9. A program parallelizing apparatus according to claim 5, further including virtual machine codes which 
can be translated in one-to-one correspondence into data expressed in the form of corresponding functions. 

10. A program parallelizing apparatus according to claim 6, further including virtual machine codes
which can be translated in one-to-one correspondence into data expressed in the form of corresponding
functions.

11. A program parallelizing apparatus according to claim 7, further including virtual machine codes
which can be translated in one-to-one correspondence into data expressed in the form of corresponding
functions.

12. A program parallelizing apparatus according to claim 8, further including virtual machine codes 
which can be translated in one-to-one correspondence into data expressed in the form of corresponding 
functions. 